Abstract

Fault isolation and sensor placement are vi- 
tal for monitoring and diagnosis. A sensor 
conveys information about a system’s state 
that guides troubleshooting if problems arise. 
We are using machine learning methods to 
uncover behavioral patterns over snapshots 
of system simulations that will aid fault iso- 
lation and sensor placement, with an eye to- 
wards minimality, fault coverage, and noise 
tolerance. 

1 Introduction 

Accurate and timely fault diagnosis is crit- 
ical in the life cycle of many physical sys- 
tems. Seemingly minor faults can, if un- 
remedied, lead to catastrophic faults that 
disable a system permanently. To iden- 
tify faults, (human or machine) diagnosti- 
cians observe the system’s behavior primar- 
ily through sensor readings. Sensors should 
generally be selected to be maximally infor- 
mative about the state of the system. In the 
best of all possible worlds, we might expect 


that sensors should be placed on all measur- 
able quantities of a system; anomalous val- 
ues on one or more sensors could then read- 
ily identify the presence of and help isolate 
system faults. However, costs are associated 
with sensors. These costs correspond to ac- 
tual monetary cost as well as costs due to 
the physical design constraints of the sys- 
tem such as power, mass, and volume which 
are at a high premium in systems such as 
Space Station Freedom. In addition, in- 
creased numbers of sensors introduce more 
information that an operator must attend 
to; too many sensors can lead to informa- 
tion overload, thus actually contributing to 
a degradation in (human) diagnostic perfor- 
mance. 

In many cases it is neither feasible nor de- 
sirable to measure all quantities of a system. 
Thus, the diagnostician must interact with 
the system in two other ways: probing and 
testing. One can think of probing as sens- 
ing a quantity dynamically to determine its 
value at a particular point in time. In testing,
we examine a component’s output quantities
while systematically varying its inputs.
Probing and testing increase the cost (e.g.,
time) of diagnosis and may even be impos- 
sible on remote systems such as unmanned 
spacecraft. Moreover, probing and testing 
are only initiated when there is some indi- 
cation of a fault. Thus, we would like to ju- 
diciously place sensors so that they indicate 
the existence of faults and focus attention 
on their plausible causes. 

Sensor placement is the task of determin- 
ing a set of sensors which allows the most ac- 
curate determination of the overall state of a 
monitored system while minimizing costs relating
to the number of sensors, power consumption,
monetary cost, and weight. Reducing these
quantities is particularly important in space 
platforms due to power and space restric- 
tions. In response, we are using two ma- 
chine learning methods to identify categories 
of system behavior that are similar in terms 
of measurable quantities. In this paper we 
describe the specific methods used and ana- 
lyze their results. As we will illustrate, these 
results can be exploited for purposes of diag- 
nosis and design for diagnosability, notably 
sensor placement. 

We describe a methodology for applying 
inductive learning systems to the discovery 
of ‘rule bases’ for diagnosis. Our primary
reason for doing so is to facilitate system de- 
sign. In particular, rules suggest measurable 
quantities that are most diagnostic. Given a 
suitable tradeoff between coverage, accuracy 
and sensor cost, we envision a tool that aids 
system designers in sensor selection. We are 
currently in the process of systematically ex- 
ploring the interaction between these factors 
in the context of two learning systems, Quin- 
lan’s C4.5 [13] and Fisher’s COBWEB [6], 
with a longer-term goal of developing objec- 
tive function(s) that reflect such a tradeoff. 


2 Supervised Learning 
Approach 

Supervised learning systems discover rules 
that characterize preclassified observations. 
For example, supervised machine learning 
systems are used in medical diagnosis; given
patient case histories that record features 
such as gender, age, aspects of medical his- 
tory, and a variety of test results, as well 
as a diagnosis provided by a physician, a 
supervised system discovers rules that are 
consistent with the physician-supplied diag- 
noses. We can also use this technology for 
purposes of fault diagnosis. In particular, 
consider the model of a thermal subsystem 
given in Figure 1. 

We have used the following strategy to 
learn rules that distinguish a variety of con- 
ditions that can cause anomalous behavior 
in this system. 

[1] Specify a simulator that represents each 
major system component as a func- 
tion that maps component inputs to 
outputs. Simulation using a model- 
based methodology similar to Kuipers’ 
[10] begins with an initial state of sys- 
tem parameter settings and propagates 
parameter changes through component 
functions until the simulator converges 
on a steady state. 

[2] Associated with each system component
are permissible parameter (continuous 
and discrete) ranges, within which the 
component is assumed to operate sat- 
isfactorily. Initial simulator parame- 
ters are systematically perturbed be- 
yond extreme ends of these ranges for 
each component, thus yielding condi- 
tions under which the system is liable 
to malfunction. 
Figure 1: A thermal model.


[3] Each condition set generated in step [2]
is propagated through the system un- 
til a steady state (or some error con- 
dition) is reached. A database record 
(which consists of measurements from 
each observable parameter in the sys- 
tem, labeled by the initial perturbed 
condition) is generated. 

[4] The system state descriptions of all
simulations are collected together and 
passed to a supervised learning system. 

[5] The learning system forms a decision 
tree, then extracts rules that distin- 
guish anomalous behaviors that were 
caused by different parameter pertur- 
bations. 
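
The five steps above can be sketched in a few lines of Python. The toy model below is purely illustrative: the two-parameter thermal loop, its update rule, the nominal ranges, the severity multipliers, and all names (`simulate`, `perturbations`, `build_records`) are assumptions made for this sketch, not the simulator described in this paper.

```python
# Hypothetical nominal values and permissible ranges for a toy thermal loop.
RANGES = {"heater_power": (40.0, 60.0), "pump_speed": (0.8, 1.2)}
NOMINAL = {"heater_power": 50.0, "pump_speed": 1.0}


def simulate(params, tol=1e-6, max_iter=10_000):
    """Step [1]: propagate changes until the water temperature settles."""
    temp = 20.0
    new_temp = temp
    for _ in range(max_iter):
        # Toy update: heating raises temperature, pumped flow carries heat away.
        target = 20.0 + params["heater_power"] / params["pump_speed"]
        new_temp = temp + 0.5 * (target - temp)
        if abs(new_temp - temp) < tol:
            break
        temp = new_temp
    return {"water_temp": round(new_temp, 3), "flow": params["pump_speed"]}


def perturbations(severities=(1.05, 1.5, 3.0)):
    """Step [2]: push each parameter beyond both ends of its range.

    Three severities per direction share one fault label, mirroring the
    'just above', 'moderately out', and 'far out of range' cases.
    """
    for name, (low, high) in RANGES.items():
        for s in severities:
            yield f"{name}-high", {**NOMINAL, name: high * s}
            yield f"{name}-low", {**NOMINAL, name: low / s}


def build_records():
    """Steps [3]-[4]: one labeled steady-state snapshot per perturbation."""
    return [{**simulate(p), "fault": label} for label, p in perturbations()]


records = build_records()
print(len(records), "snapshots,", len({r["fault"] for r in records}), "fault labels")
```

The resulting `records` list plays the role of the database passed to the supervised learner in step [5].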

We have used a supervised learning sys- 
tem known as C4.5 to form a diagnostic 


rule base. C4.5 has separate programs that 
(1) construct a decision tree and (2) form 
a rule base. In particular, C4.5 was used 
to discriminate the system perturbations 
(‘faults’) generated in step [2] of the sim- 
ulation/learning procedure outlined above. 
Our thermal model contained a total of 87 
fault types. In addition, three versions of 
each perturbation type were generated, cor- 
responding to cases where the selected pa- 
rameter value was perturbed just above (or 
below) acceptable ranges, moderately out of 
range, and far out of range. Intuitively, 
these corresponded to conditions of high 
(low), very high (low), and extremely high 
(low) values, but each case was labeled by a 
single fault (e.g., the parameter was ‘above 
acceptable range’). Thus, the decision tree 






had to distinguish 87 ‘faults’, derived from 
over 261 observation sets (snapshots). Each 
snapshot was represented by 23 system pa- 
rameter values. Using C4.5, we constructed 
decision trees much like the one partially 
shown in Figure 2. 



Initially, we are interested in two items: 
(1) the diagnostic accuracy of this tree, if we 
insist that faults must be perfectly isolated, 
and (2) how much the tree ‘compresses’ the 
parameters needed to attain a desired accu- 
racy. We call this second factor the param- 
eter compression ratio. 

In this example, the decision tree cor- 
rectly and uniquely classified 73% of the 
snapshots over which it was constructed. 
Note that the failure to perfectly classify 
all known behaviors is the result of C4.5’s 
information-theoretic measure which could 
not reliably distinguish certain behaviors 
with the existing observable parameter val- 
ues. These points of ambiguity are precisely 
where system designers should focus sensor 
placement efforts in order to better distinguish
faults. It required that approximately
18 of the 23 parameters be consulted in order
to achieve this accuracy: a parameter
compression ratio of (23 - 18)/23, or 0.22.
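
To see why some faults cannot be separated, it helps to look at information gain, the quantity that C4.5's gain ratio criterion normalizes. The sketch below is a minimal illustration under assumed data: the four snapshots, the parameter names, and the fault labels are hypothetical, and the function is not taken from C4.5 itself.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of fault labels."""
    total = len(labels)
    return -sum(n / total * log2(n / total) for n in Counter(labels).values())


def information_gain(records, parameter):
    """Reduction in fault-label entropy from splitting on one parameter."""
    labels = [fault for _, fault in records]
    groups = {}
    for snapshot, fault in records:
        groups.setdefault(snapshot[parameter], []).append(fault)
    remainder = sum(len(g) / len(records) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical snapshots: two different faults produce identical readings.
records = [
    ({"water_temp": "high", "pump_speed": "normal"}, "heater_fault"),
    ({"water_temp": "high", "pump_speed": "normal"}, "radiator_fault"),
    ({"water_temp": "low", "pump_speed": "high"}, "pump_fault"),
    ({"water_temp": "low", "pump_speed": "high"}, "pump_fault"),
]

hot = [r for r in records if r[0]["water_temp"] == "high"]
print(information_gain(records, "water_temp"))  # splitting here is informative
print(information_gain(hot, "pump_speed"))      # no existing parameter helps
```

Within the `hot` subset no observed parameter carries any gain, which is exactly the kind of ambiguity that points to a missing sensor.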

The statistics above reflect a bias that
the decision tree (or any rule-based system
for that matter) should attempt to perfectly
isolate a fault. However, we can relax
the diagnostic task, and allow categorization
to identify an observation’s fault
as one of a small number of possibilities.
The tree above will correctly identify each
observation as exhibiting 1 of at most 3
fault possibilities (pump-speed-low,
valve1-pos-low, valve1-pos-high) in 100% of the
cases. Thus, we are interested in the
degree to which the tree isolates a fault. In
this case, our minimal fault compression
ratio is (87 - 3)/87, or 0.97.

Three aspects of this inductive analysis 
are of interest. Each of these speaks to 
the success of the diagnostic task, and pro- 
vides guidelines for fault isolation and sensor 
placement. Our particular concern is with
the latter.

• The fault compression ratio tells us the 
degree to which a behavior’s fault can 
be isolated using the rule base. In- 
versely, it is a measure of the extent 
that we will have to rely on other 
sources of knowledge and diagnostic 
procedures, such as an expert or system 
simulation in conjunction with model- 
based diagnosis, to discriminate the 
fault from the reduced set of possibil- 
ities. 

• The parameter compression ratio indi- 
cates the proportion of system param- 
eters that need to be accessed for di- 
agnosis over a population of behaviors. 
This is a guide to the number of sensors 
that will be required if diagnosis relies 
simply on sensor values. 

• The diagnostic accuracy in a system is 
the percentage of behaviors that are 
correctly categorized as one of several
possibilities. It measures the reliabil- 
ity of diagnosis within the rule base, 
whereas fault compression measures the 
granularity. 
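
The three measures can be written down directly. The sketch below reproduces the two compression ratios from the figures reported in Section 2; the helper names and the small accuracy example are assumptions introduced here for illustration.

```python
def compression_ratio(total, retained):
    """Fraction of `total` items eliminated when only `retained` remain."""
    return (total - retained) / total


def diagnostic_accuracy(true_faults, candidate_sets):
    """Share of behaviors whose true fault is among the reported candidates."""
    hits = sum(fault in cands for fault, cands in zip(true_faults, candidate_sets))
    return hits / len(true_faults)


# Figures from the thermal model: 18 of 23 parameters consulted,
# and each observation narrowed to at most 3 of 87 faults.
parameter_compression = compression_ratio(total=23, retained=18)
fault_compression = compression_ratio(total=87, retained=3)
print(f"{parameter_compression:.2f} {fault_compression:.2f}")  # prints "0.22 0.97"

# Hypothetical check of accuracy on three observations.
accuracy = diagnostic_accuracy(
    ["pump-speed-low", "valve1-pos-low", "valve1-pos-high"],
    [{"pump-speed-low", "valve1-pos-low"}, {"valve1-pos-low"}, {"pump-speed-low"}],
)
```

Larger candidate sets raise accuracy but lower fault compression, which is the interdependence discussed next.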

These factors are, of course, interdepen- 
dent. For example, decreasing allowable 
fault compression (undesirable) will tend to 
increase the required parameter compres- 
sion (desirable), and increase diagnostic ac- 
curacy (desirable). In general, we cannot 
hope to optimize each of these parameters. 
Rather, design and sensor placement must 
optimize some tradeoff between them. For 
example, if accuracy is at a premium, then 
we may have to accept a decrease in fault
compression. This implies a corresponding 
(but desirable) increase in parameter com- 
pression, and an expected decrease in sen- 
sor ‘cost’ as well. However, the undesir- 
able decrease in fault compression implies 
that diagnostic cost will increase from hav- 
ing to employ secondary diagnostic proce- 
dures such as probing, testing, and simula- 
tion to a larger extent. 

We are initiating systematic experiments 
across the range of diagnostic factors, with 
the eventual goal of defining an objective 
function that characterizes an appropriate 
tradeoff between them. Such a function 
will allow us to bound certain factors (e.g. 
accuracy, parameter compression or sensor 
‘cost’) and to optimize for the remaining 
factors (e.g., fault compression). Our cur- 
rent version of C4.5 builds a decision tree 
based on the diagnosticity of system param- 
eter values. Other variations that take into 
account the cost of sensing certain values 
have also been developed by Tan & Schlim- 
mer [15]. 

A decision tree representation of a rule 
base is conceptually simple, and it has the 
desirable aspect of encoding the ‘minimal’


number of system measurements needed to 
isolate faults to a certain granularity. How- 
ever, it also has some well-known disadvan- 
tages. Notably, a decision tree is very sen- 
sitive to noise in sensed system values (or 
faulty sensors, which we regard as another 
type of noise): a single misleading value can 
lead diagnosis considerably astray. One implication
is that the minimality characteristic
of decision trees may not be wholly desirable;
uncertainty in a domain may insist
on some redundancy in the sensed values, in 
order to better protect against the possibil- 
ity of noise. Thus, in addition to our studies 
with C4.5, we are also investigating a second 
inductive approach known as clustering. 

3 Cluster-Analytic Approach

A data analyst must often identify sim- 
ilarities and differences between observa- 
tions. For example, a biologist will cate- 
gorize a newly discovered organism into a 
known genus based on its similarities with
known species of the class and differences 
with members of competing genera. An 
economist may recognize a trend in the mar- 
ket as having occurred previously, and fore- 
cast a particular outcome based on these his- 
torical similarities. The need to ‘cluster’ ob- 
servations is critical in many fields, includ- 
ing the biological and social sciences, where 
it has spawned data analysis tools of numerical
taxonomy or cluster analysis (e.g., Jain
& Dubes [8]). Clustering methods have also
evolved in artificial intelligence (AI) and machine
learning (e.g., Michalski & Stepp [11]).

Clustering systems automatically discover 
categories of observations (events or objects) 
that are similar along some dimension(s).
Once uncovered, these categories may suggest
features that characterize the observed
data and/or facilitate predictions about the 
nature of future data. As in scientific en- 
deavors, engineering disciplines can profit 
from clustering. For example, in diagnosis 
an observation may be a set of symptoms 
that collectively indicate a class of events 
that share a common diagnosis. We believe 
that discovered clusters can be used dynamically
for automated diagnosis, and that like
a data analyst, a system designer can use 
clusters over simulated behavior to facilitate 
design - in this case sensor placement. 

3.1 Cobweb: A sample clustering system

A clustering system constructs a classifica- 
tion scheme over a set of observations. Fig- 
ure 3 illustrates a classification tree con- 
structed over five observations by a clus- 
tering system called COBWEB. Each node 
(class) in this tree represents a cluster of 
observations. Each cluster is represented 
by the distribution of attribute values over 
members of that node; this illustrative ex- 
ample assumes that observations are rep- 
resented by attributes of Size (small, 
medium, large), Shape (square, sphere, 
pyramid), and Color (blue, green, red). 
Each leaf of the tree represents a category
covering a single observation; the probability
of each value in a leaf,
$P(A_i = V_{ij} \mid \mathrm{leaf}_k)$, is 1.0 (i.e., present in the
corresponding observation) or 0.0 (i.e., absent,
in which case it is not explicitly stored at
the node). The root of the tree covers all
observations, with base rate probabilities
$P(A_i = V_{ij} \mid \mathrm{root})$ that reflect global value
distributions. In general, each node, $C_k$,
contains probabilities, $P(A_i = V_{ij} \mid C_k)$, for
each attribute value observed in a member
of the node. In addition, the proportion of 


observations stored under each node relative 
to the node’s parent is stored with the node. 
For example, forty percent of the observa- 
tions stored under the root are stored under 
node $C_1$: $P(C_1 \mid \mathrm{root}) = 0.4$.
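
As a rough illustration of what is stored at each node, the following sketch computes value distributions for a root and one child. The five observations and the choice of which two fall under the first child are hypothetical (they are not the contents of Figure 3), and `value_distribution` is a name introduced only for this example.

```python
from collections import Counter


def value_distribution(observations):
    """P(attribute = value | node) for every value seen among the members.

    Values absent from the node are simply not stored, as in the text.
    """
    n = len(observations)
    counts = Counter((a, v) for obs in observations for a, v in obs.items())
    return {key: c / n for key, c in counts.items()}


# Five hypothetical observations; the first two are grouped under node C1.
root = [
    {"Size": "small", "Shape": "square", "Color": "blue"},
    {"Size": "small", "Shape": "sphere", "Color": "blue"},
    {"Size": "medium", "Shape": "sphere", "Color": "green"},
    {"Size": "large", "Shape": "pyramid", "Color": "red"},
    {"Size": "large", "Shape": "square", "Color": "red"},
]
c1 = root[:2]

p_c1_given_root = len(c1) / len(root)  # proportion stored with C1
p_small_in_c1 = value_distribution(c1)[("Size", "small")]
p_small_in_root = value_distribution(root)[("Size", "small")]
```

Here two of the five observations sit under the child, so the stored proportion is 0.4, matching the example in the text.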

We will not describe the strategy used to 
build this categorization hierarchy over ob- 
servations since it is of limited relevance in 
future discussion, and any of several strate- 
gies can be used. However, it is important 
to note that every clustering system relies on 
a measure of cluster quality. In Cobweb’s 
case this is a measure of category utility de- 
rived from Gluck & Corter [3]: 

$$CU(C_k) = P(C_k) \sum_i \sum_j \big[ P(A_i = V_{ij} \mid C_k) \log_2 P(A_i = V_{ij} \mid C_k) - P(A_i = V_{ij}) \log_2 P(A_i = V_{ij}) \big],$$

which rewards clusters that increase the cer- 
tainty inherent in the attribute value dis- 
tributions. The expression above is appro- 
priate for nominally-valued (i.e., discrete, 
unordered, finite) attributes, but several 
variations on this basic scheme (Gennari, 
Langley, & Fisher [7]; Reich & Fenves [14])
have been adapted to handle observations 
described over ordinal and continuously- 
valued attributes as well. The certainty- 
maximizing measure is used recursively, first 
to build a partition over the entire popula- 
tion of observations, and then to subparti- 
tion each of these initially-constructed clus- 
ters, thus yielding a categorization hierar- 
chy. Our particular interest in this process 
is its ability to discover clusters over snap- 
shots or instantaneous descriptions of sys- 
tem simulations. 
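
A minimal sketch of the nominal-attribute measure follows, assuming the expression is read as a sum over all attributes and all of their values of the conditional term minus the base-rate term, scaled by the cluster's probability, with 0 log 0 taken as 0. The function names and the four toy observations are assumptions for illustration, not part of Cobweb.

```python
from collections import Counter
from math import log2


def distribution(observations):
    """Relative frequency of each (attribute, value) pair."""
    n = len(observations)
    counts = Counter((a, v) for obs in observations for a, v in obs.items())
    return {key: c / n for key, c in counts.items()}


def plogp(p):
    """p * log2(p), with the usual convention that 0 * log2(0) = 0."""
    return p * log2(p) if p > 0 else 0.0


def category_utility(cluster, population):
    """Score one cluster (a subset of `population`) by its gain in certainty."""
    base = distribution(population)
    cond = distribution(cluster)
    p_cluster = len(cluster) / len(population)
    gain = sum(plogp(cond.get(key, 0.0)) - plogp(p) for key, p in base.items())
    return p_cluster * gain


population = [
    {"Color": "red", "Size": "small"},
    {"Color": "red", "Size": "small"},
    {"Color": "blue", "Size": "small"},
    {"Color": "blue", "Size": "small"},
]
reds = population[:2]
print(category_utility(reds, population))        # 0.5: Color becomes certain
print(category_utility(population, population))  # 0.0: nothing gained
```

The cluster of red objects removes one bit of uncertainty about Color and covers half the population, giving 0.5; Size is constant everywhere and contributes nothing.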

3.2 Discovering Fault Modes 

We use Cobweb to discover categories of 
fault conditions over system simulations. 
This proceeds in much the same way as 
Figure 3: A classification tree constructed by COBWEB.


the simulation/induction procedure of Sec- 
tion 2, except that in Step [4], the snapshots 
are passed to our clustering system rather 
than a supervised one. An example of a 
categorization tree of discovered fault modes 
for the thermal system is partially shown in 
Figure 4. Each datum consists of inputs 
and outputs, for all components, including 
the single perturbed value (as described in 
step [2]); that is, each datum is a snapshot 
of the system. We do not show the proba- 
bility distributions over all attribute values 
for clusters, but simply label each low-level 
node by a descriptor that conveys the fault- 
mode meaning. Thus, low flow through the 
radiator and a malfunction to the heater it- 
self both result in high water temperatures 
(Example 1), despite the fact that this be- 
havior emerges for very different reasons. 
Similarly, high flow through the pump ap- 
pears somewhat similar to a second heater 
malfunction: both result in low water tem- 


peratures (Example 2). 

As with C4.5, the benefits of clustering 
are at least two-fold. First, it is difficult 
for engineers to completely design against 
system faults in advance. Collectively, sim- 
ulation and clustering identify fault models 
that benefit design decision making. For ex- 
ample, a faulty heater may overheat water in 
the thermal system, but this behavior may 
appear to be similar to, and thus be clus- 
tered with, a radiator (heat exchanger) that 
does not sufficiently cool water. Second, as 
with C4.5, these ambiguities can alert ana- 
lysts to place sensors that better distinguish 
these conditions. 

Again like C4.5, a Cobweb classifica- 
tion tree can also facilitate fault diagnosis. 
In particular, categories discovered through 
clustering associate observable/sensor/test
features with component faults that lead to 
the observed anomalies. We wish to clas- 
sify an observable set of sensor readings to a 
level of the classification tree where a reasonably
certain prediction of the underlying
fault can be made. However, a categorization
and diagnosis procedure is less
clear with a COBWEB generated tree, since 
it does not specify a single value that should 
be sensed at any particular point as a deci- 
sion tree does. Rather, we can exploit char- 
acteristic attribute values of discovered cat- 
egories to direct sensor testing. There are 
a number of ways for identifying character- 
istic (or normative) values, as described in 
Fisher [6] and Reich & Fenves [14], but suffice
it to say that they are values that are typ- 
ically true of category members, and typ- 
ically discriminate the category’s members 
from other, contrasting categories. Charac- 
teristic values suggest tests that are likely 
to discriminate the most promising paths of 
the tree during classification: verification of 
a characteristic value(s) suggests that the 
associated path be followed, thus narrow- 
ing the plausible faults that are consistent 
with the known observables; failure to ob- 
serve the expected value reduces the likeli- 
hood that the associated path will lead to a 
correct diagnosis. 
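
One simple way to operationalize characteristic values, sketched here under stated assumptions, is to require that a value be both common within a category and largely confined to it. The threshold of 0.7, the function name, and the toy clusters are hypothetical; the cited works describe other ways of identifying normative values.

```python
def characteristic_values(clusters, target, threshold=0.7):
    """Values that are typical of `target` and discriminate it from the rest."""
    members = clusters[target]
    everyone = [obs for group in clusters.values() for obs in group]
    result = []
    for attribute in sorted(members[0]):
        for value in sorted({obs[attribute] for obs in members}):
            in_cluster = sum(obs[attribute] == value for obs in members)
            overall = sum(obs[attribute] == value for obs in everyone)
            predictability = in_cluster / len(members)  # P(value | category)
            predictiveness = in_cluster / overall       # P(category | value)
            if predictability >= threshold and predictiveness >= threshold:
                result.append((attribute, value))
    return result


# Hypothetical fault-mode clusters over two sensed quantities.
clusters = {
    "overheat": [
        {"water_temp": "high", "fan": "on"},
        {"water_temp": "high", "fan": "on"},
    ],
    "overcool": [
        {"water_temp": "low", "fan": "on"},
        {"water_temp": "low", "fan": "on"},
    ],
}
print(characteristic_values(clusters, "overheat"))  # [('water_temp', 'high')]
```

The fan reading is typical of both clusters and so discriminates neither, whereas the water temperature suggests a test worth performing first.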


The primary advantage of this strategy 
over C4.5 is that the categorization tree 
formed through clustering specifies a num- 
ber of values at each node of the tree that 
can be sensed in order to guide further cate- 
gorization or diagnosis. The decision tree 
structure is not generally as robust when 
certain values cannot be reliably sensed be- 
cause of noise. In contrast, the increased in- 
formation redundancy of the COBWEB tree 
is more robust in the face of noise, but
redundancy also comes with the disadvantage
that parameter compression is
correspondingly lower.


4 Attention Focusing 

Consider the space between the decision tree 
approach and the conceptual clustering ap- 
proach as a continuum on feature structure. 
In decision trees the structure is fixed during 
training so that the order for feature testing 
during prediction is rigid. There is one feature
test at each node, which leads to a node
at a deeper level (and another test). 

In conceptual clustering there is no fea- 
ture structure. To determine how to branch 
into the concept hierarchy, one must test ev- 
ery feature in the current node. In some 
cases this could lead to a significant number 
of tests (e.g., in our domain example from 
Section 2). 

Optimally, we would like to classify an 
object or event in as few tests as possi- 
ble with as few branches as possible. The 
decision tree approach would seem to have 
a tremendous advantage in classification of 
problems with highly independent feature 
spaces. However, when in a feature space 
with specific dependencies, it would be nice 
to cluster tests over these dependencies and 
branch deeper into the tree with fewer tests. 
One way in which we accomplish this is to 
examine the salience of each feature within 
each node, calculating what amounts to a 
category utility for each feature within the 
scope of its parent node. 

The order of inspection for features in 
each node is then relative to its salience. 
The salience for a feature can be computed 
in any number of ways. In the equation be- 
low we show a general method for calculat- 
ing salience based on standard deviation. 

$$\mathrm{salience}_i = \frac{\sum_{k=1}^{K} P(C_k)\,\frac{1}{\sigma_{ik}} - \frac{1}{\sigma_{ip}}}{K}$$

where $K$ is the number of classes, $P(C_k)$
is the probability of a particular class,
$\sigma_{ik}$ is the standard deviation of the feature
within class $k$, and $\sigma_{ip}$ is its standard
deviation within the parent node.

Figure 4: A partial classification tree of fault modes for the thermal model.

Using the notion of salience, an algorithm 
can be derived that focuses attention on 
the most informative features to test before 
branching into a behavior hierarchy. The 
following describes our algorithm for atten- 
tion: 

1. Select an unseen feature with probabil- 
ity based on salience scores stored at 
the parent. 

2. Compute the salience of the selected 
feature; store this new score at the par- 
ent. 

3. Compute the category utility score for
the best classification, x, based only on
features inspected so far.

4. Consider all remaining unseen features;
if these were to match the second best
classification, would the score be better
than x?

5. If yes, goto step [1], otherwise ignore
remaining attributes and branch to new
node.
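
The stopping test in steps 3 to 5 can be illustrated with a simplified sketch. Two departures from the algorithm above are assumptions made to keep the example short and deterministic: features are inspected in descending salience order rather than sampled with salience-based probability, and a summed match probability stands in for the category utility score. The child descriptions and salience values are hypothetical.

```python
def classify_with_attention(observation, children, salience):
    """Return (best_child, features_inspected) for one branching decision.

    `children` maps a child name to {feature: {value: probability}} and
    must contain at least two entries.
    """
    order = sorted(salience, key=salience.get, reverse=True)
    scores = {name: 0.0 for name in children}
    inspected = []
    best = None
    for idx, feature in enumerate(order):
        value = observation[feature]
        inspected.append(feature)
        for name, dist in children.items():
            scores[name] += dist.get(feature, {}).get(value, 0.0)
        ranked = sorted(scores, key=scores.get, reverse=True)
        best, runner_up = ranked[0], ranked[1]
        remaining = len(order) - (idx + 1)
        # Each unseen feature can add at most 1.0 to the runner-up's score.
        # If even a perfect match cannot beat the best score, stop testing.
        if scores[runner_up] + remaining * 1.0 <= scores[best]:
            break
    return best, inspected


children = {
    "high_temp": {"water_temp": {"high": 1.0},
                  "flow": {"low": 0.5, "normal": 0.5},
                  "fan": {"on": 1.0}},
    "low_temp": {"water_temp": {"low": 1.0},
                 "flow": {"high": 0.5, "normal": 0.5},
                 "fan": {"on": 1.0}},
}
salience = {"water_temp": 0.9, "flow": 0.4, "fan": 0.0}
observation = {"water_temp": "high", "flow": "low", "fan": "on"}

best, inspected = classify_with_attention(observation, children, salience)
print(best, inspected)  # high_temp ['water_temp', 'flow']
```

In this example the fan reading is never consulted, because after two tests the runner-up could no longer overtake the best child.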

A problem closely associated with the cal- 
culation of feature salience is the selection 
of parametric measurements to ensure com- 
plete and cost-effective diagnosis. In ana- 
lyzing a design for fault isolation we exam- 
ine several additional factors, or properties, 
that belong to the device used for sensing a 
particular feature. A partial list of factors 
governing sensor selection follows: 

• response time
• maintainability
• launch weight
• I/O performance
• criticality
• power consumption
• reliability
• procurement price
• repeatability
• number of sensors
• accuracy
• operating temperature
• resolution
• operating pressure

Table 1: Factors for sensor selection.

So, when looking at which salient features
to actually measure, an objective equation
to minimize cost and maximize feature coverage
must be designed. Below we offer a
general form for such an objective equation:
$$\min \sum_{i=1}^{n} w_i f_i'$$

where $\sum_i w_i = 1$, $f_i \in \{f_1, \ldots, f_n\}$
are $n$ sensor factors, and $f_i' = \| f_i \|$ is a
normalized value representing the sensor factor
within some range.
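
As a concrete illustration, the sketch below scores candidate sensors by a weighted sum of normalized factors, assuming the weights sum to one and that each factor is min-max normalized across the candidates so that lower is better. The candidate sensors, their factor values, and the weights are invented for the example.

```python
def normalize(values):
    """Min-max scale a {name: raw_value} mapping into [0, 1]."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    return {k: (v - lo) / span if span else 0.0 for k, v in values.items()}


def sensor_costs(candidates, weights):
    """Weighted sum of normalized factors for each candidate sensor."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
    normalized = {
        factor: normalize({name: f[factor] for name, f in candidates.items()})
        for factor in weights
    }
    return {
        name: sum(w * normalized[factor][name] for factor, w in weights.items())
        for name in candidates
    }


# Hypothetical candidates described by three of the factors in Table 1.
candidates = {
    "thermocouple": {"power": 1.0, "weight": 0.2, "price": 50.0},
    "flow_meter": {"power": 5.0, "weight": 1.0, "price": 400.0},
    "pressure_gauge": {"power": 2.0, "weight": 0.5, "price": 150.0},
}
weights = {"power": 0.5, "weight": 0.3, "price": 0.2}

costs = sensor_costs(candidates, weights)
cheapest = min(costs, key=costs.get)
print(cheapest)  # thermocouple
```

Factors for which larger values are preferable, such as reliability or accuracy, would need to be inverted before normalization; that step is omitted here.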

The following algorithm can be used for 
selecting which salient features to measure 
in a system under design. 

1. Set threshold for objective equation. 

2. Apply objective equation. 

3. Collect sensor recommendations. 

4. If parameter compression and fault 
compression (from decision tree analy- 
sis) are exceeded, then adjust threshold; 
goto root-node and restart. Otherwise 
branch and goto step [2]. 

5 Related Work 

Work currently underway at JPL complements
our research. JPL’s AI Group has
identified numerous factors that influence
optimal sensor placement in Chien, Doyle,
& de Mello [1], Chien, Doyle, & Rouquette [2],
and Doyle & Fayyad [5]. Among these are
factors that relate to the diagnosticity of 
sensors - i.e., the ability of sensed system 


quantities to predict the presence and lo- 
cation of faults. Roughly, diagnosticity is 
measured by simulating a fault on a system 
model, and then observing the changes to 
various model quantities. Quantities that 
differ most relative to their normal state 
(and possibly their value during other, com- 
peting fault conditions), are judged good 
predictors of that particular fault. In gen- 
eral, the approach makes pairwise compar- 
isons between the same quantities under 
two different fault modes, and two different 
quantities under identical fault conditions. 
The approach appears to be generally help- 
ful, but the utility of pairwise comparisons 
is limited. In contrast, our two learning ap- 
proaches seek patterns or rules across mul- 
tiple dimensions (i.e., multiple fault modes, 
and multiple sensed quantities) of system 
behavioral snapshots simultaneously. This 
approach can provide a more global perspec- 
tive on system behavior, and makes certain 
multidimensional patterns explicit to the de- 
signer. 


Furthermore, our approach to sensor 
placement is guided by an explicit model 
of the diagnostic process. This top-down 
approach contrasts with JPL’s bottom-up 
approach, which is primarily responsible 
for enumerating a wider variety of fac- 
tors that play a role in sensor placement. 
Our primary focus on a single aspect (i.e., 
information-content) of system parameter 
values that might act as good sensors is 
a disadvantage of our approach relative to 
JPL’s. However, we view the two ap- 
proaches as complementary, and are pursu- 
ing links between them. 
6 Concluding Remarks

Our approach to sensor selection is distin- 
guished from others in that it is guided by 
an explicit model of diagnosis; this top-down 
methodology promises principled criteria for 
sensor placement. Although our models of 
diagnosis are primarily useful for design, the 
rule bases developed through clustering and 
supervised methods could be used directly 
for diagnosis as well - either autonomously 
or by a human user. In this, we recognize the 
importance of both rule-based and model- 
based approaches as contrasted in Keller[9] 
and Davis[4]. Our bias is that inductive ap- 
proaches can never replace model-based ap- 
proaches in any but the most trivial of ap- 
plications. As Keller points out, ‘compiled’ 
knowledge is most helpful in diagnosing rel- 
atively routine faults. To attempt a rule- 
based approach that covers idiosyncratic 
faults as well (i.e., achieves very high fault 
compression) invites ‘overfitting’ (i.e., unac- 
ceptably low accuracy and/or unacceptably 
low parameter compression). The overfit- 
ting phenomenon is well-known in machine 
learning, but inductive approaches to com- 
pilation for diagnosis have not traditionally 
addressed the issue, as shown in Pearce[12]. 
Rather, an ideal tradeoff between coverage, 
cost, and accuracy must only assume that a 
certain diagnostic burden is taken on by the 
compiled rule base. Our primary goal is to 
limit, but not eliminate, the space of faults 
that need be explored by probing, testing, 
and simulation. 